System, method and a computer readable medium for providing an output image

ABSTRACT

A method for providing an output image, the method includes: determining an importance value for each input pixel out of multiple input pixels of an input image; applying on each of the multiple input pixels a conversion process that is responsive to the importance value of the input pixel to provide multiple output pixels that form the output image; wherein the input image differs from the output image.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a Reissue of U.S. patent application Ser. No. 12/597,036, filed May 11, 2011, now U.S. Pat. No. 8,718,333, which is a National Phase Application of PCT International Application No. PCT/IL2008/000528, entitled “SYSTEM, METHOD AND A COMPUTER READABLE MEDIUM FOR PROVIDING AN OUTPUT IMAGE”, International Filing Date Apr. 17, 2008, published on Oct. 30, 2008 as International Publication No. WO 2008/129542, which in turn claims priority of U.S. Provisional Patent Application Ser. No. 60/913,301, filed Apr. 23, 2007, each of which is incorporated herein by reference in its entirety.

BACKGROUND OF THE INVENTION

With the recent advent of mobile video displays, and their expected proliferation, there is an acute need to display video on a smaller display than originally intended. Two main issues need to be confronted. The first is the need to change the aspect ratio of a video. The second is the need to down-sample the video whilst maintaining enough resolution of objects-of-interest. An example of the first challenge is the display of wide screen movies on a 4:3 TV screen. Displaying a ball game on a cellular screen is a good example of the need for a smart down-sampling technique, where the ball needs to remain large enough to be easily seen on screen.

The current industry solutions are basic and not very effective. They include: blunt aspect-ratio-free resizing; cropping the middle of the video; resizing while preserving the aspect ratio by adding black stripes above and below the frame; and keeping the middle of the frame untouched while warping the sides. In fact, it is common nowadays to have printed lines on movie-cameras' screens that mark the region that will remain visible in the frame after it is cropped to the aspect ratio of a regular 4:3 TV screen.

There is a growing need to provide effective devices and methods for image transformation.

SUMMARY OF THE INVENTION

A method for providing an output image, the method includes: determining an importance value for each input pixel out of multiple input pixels of an input image; applying on each of the multiple input pixels a conversion process that is responsive to the importance value of the input pixel to provide multiple output pixels that form the output image; wherein the input image differs from the output image.

A device for providing an output image, the device includes: a memory unit adapted to store an input image; and a processor adapted to: determine an importance value for each input pixel out of multiple input pixels of an input image and apply on each of the multiple input pixels a conversion process that is responsive to the importance value of the input pixel to provide multiple output pixels that form the output image; wherein the input image differs from the output image.

A computer readable medium that stores instructions for: determining an importance value for each input pixel out of multiple input pixels of an input image; and applying on each of the multiple input pixels a conversion process that is responsive to the importance value of the input pixel to provide multiple output pixels that form the output image; wherein the input image differs from the output image.

A method for providing an output image, the method includes: receiving an input frame and a feature mask defined by a rough selection of the features; applying a mapping process to provide the output image; wherein the mapping process differentiates between pixels of features included in the feature mask and other pixels; wherein the applying comprises solving a sparse equation set.

A device for providing an output image, the device includes: a memory unit adapted to store an input image and a feature mask defined by a rough selection of the features; and a processor adapted to: apply a mapping process to provide the output image; wherein the mapping process differentiates between pixels of features included in the feature mask and other pixels; wherein the applying comprises solving a sparse equation set.

A computer readable medium that stores instructions for: receiving an input frame and a feature mask defined by a rough selection of the features; applying a mapping process to provide the output image; wherein the mapping process differentiates between pixels of features included in the feature mask and other pixels; wherein the applying comprises solving a sparse equation set.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1a illustrates a system according to an embodiment of the invention;

FIG. 1b illustrates a system according to another embodiment of the invention;

FIG. 2 illustrates a flow chart of a method according to an embodiment of the invention;

FIG. 3 illustrates a flow chart of a method according to another embodiment of the invention;

FIG. 4 illustrates a flow chart of a method according to an embodiment of the invention;

FIG. 5 illustrates a flow chart of a method according to an embodiment of the invention;

FIGS. 6a-6b, 7a-7h, 8a-8c, 9a-9d, 10a-10f, and 11a-11l are input and output images;

FIGS. 12a-12c, 13a-13b, 14a-14d, 15a-15e, 16a-16c, 17a-17c, 18a-18d, and 19a-19b are input and output images; and

FIG. 12d includes a series of images that illustrate various stages of a method according to an embodiment of the invention.

DETAILED DESCRIPTION OF THE DRAWINGS

The terms “frame” and “image” have the same meaning: each means a set or group of pixels. Multiple frames or images can form a shot; one or more shots can form a video stream.

Video retargeting is the process of transforming an existing video to fit the dimensions of an arbitrary display. A compelling retargeting aims at preserving the viewers' experience by maintaining the information content of important regions in the frame, whilst keeping their aspect ratio.

An efficient method for video retargeting is introduced. It includes two stages. First, the frame (input image) is analyzed to detect the importance of each region (or multiple input pixels) in the frame. Then, a transformation (conversion process) that respects the analysis shrinks less important regions more than important ones. The analysis is fully automatic and based on local saliency, motion detection, and object detectors. The performance of the proposed method is demonstrated on a variety of video sequences, and compared to the state of the art in image retargeting.

A method is provided. The method assigns a saliency score to each pixel in the video. An optimized transformation of the video to a downsized version is then calculated that respects the saliency score. The method is designed to work efficiently in an online manner, ultimately leading to a real-time retargeting of a streaming input video to several output formats. The saliency score is composed of three basic components: spatial gradient magnitude, a face detector (or another object-of-interest detector), and a block-based motion detector. The optimization stage amounts to solving a sparse linear system of equations. It considers spatial constraints as well as temporal ones, leading to a smooth temporal user experience. It is noted that the method can be applied off-line, with the advantage of analyzing the entire shot.

Conveniently, a computer readable medium is provided. The computer readable medium can be a diskette, a compact disk, a disk, a tape, a memory chip and the like, and it stores instructions for: determining an importance value for each input pixel out of multiple input pixels of an input image; and applying on each of the multiple input pixels a conversion process that is responsive to the importance value of the input pixel to provide multiple output pixels that form the output image; wherein the input image differs from the output image. The computer readable medium can store instructions for executing any stage of method 100.

Given a new frame, the method computes a per-pixel importance measure. This measure is a combination of three factors: a simple, gradient based, local saliency; an off-the-shelf face detector; and a high-end motion detector.

The optimization of the mapping function from the source resolution to the target resolution is set through a linear system of equations. Each pixel (i,j) at each frame t is associated with two variables x_(i,j,t), y_(i,j,t) that determine its location in the retargeted frame. The method optimizes for horizontal warps and for vertical warps separately, using the same technique. The horizontal post-warp location is first constrained to have the same coordinates as the warp of the pixel just below it, x_(i,j+1,t), and the warp of the same pixel in the previous frame, x_(i,j,t−1). Then it is constrained to have a distance of one from the warping of its left neighbor, x_(i−1,j,t).

For obvious reasons, it is impossible to satisfy all of the constraints and yet fit into smaller retargeting dimensions. The space-preserving constraints are therefore weighted in proportion to each pixel's importance value. A pixel with high importance is mapped to a distance of approximately one from its left neighbor, while a pixel of less saliency is mapped closer to its neighbor. Time smoothness is also taken into consideration, in order to generate a continuous, natural-looking video.

The method is designed for video streaming. Therefore, time smoothness and motion analysis considerations are limited to the previous frames only. Such considerations need only apply to frames of the same shot (a sequence of video frames taken from a continuous viewpoint). It is noted that the method can be extended to an off-line method, and thus can incorporate the advantages of off-line video analysis. These include smooth time analysis that computes the mapping of each frame (image) based on an arbitrary number of frames before or after it, and better, more accurate motion detectors, shot detectors and object detectors that can incorporate information from several frames at once.

The proposed method automatically breaks a long video into a sequence of shots using a simple online method, similar to the one shown in Meng, Y. Juan, and S.-F. Chang, “Scene change detection in an MPEG-compressed video sequence”, Digital Video Compression: Algorithms and Technologies, 1995, where the block matching operation is replaced with the efficient method of Lu and M. Liou, “A simple and efficient search algorithm for block-matching motion estimation”, IEEE Trans. Circuits and Systems, 1997. This combination is efficient, robust, and uninfluenced by object and camera motion. First, motion estimation is applied on each macroblock (16×16 pixels). A shot boundary is detected wherever the number of blocks for which the motion estimation fails exceeds a threshold. It is noted that the proposed method can be replaced with other shot detection mechanisms, including off-line shot detection mechanisms.
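The following is a minimal sketch of such a shot detector, assuming grayscale frames supplied as 2D numpy arrays. A brute-force mean-absolute-difference search stands in for the efficient block-matching algorithm cited above, and all threshold values are illustrative assumptions rather than values taken from the text:

```python
import numpy as np

def is_shot_boundary(prev, curr, block=16, search=7, mad_thresh=12.0, fail_ratio=0.5):
    # For every 16x16 macroblock of the current frame, search the previous
    # frame for the best match; motion estimation "fails" for a block when
    # even the best match has a high mean absolute difference.
    h, w = prev.shape
    failed = total = 0
    for y in range(0, h - block + 1, block):
        for x in range(0, w - block + 1, block):
            ref = curr[y:y + block, x:x + block].astype(np.float64)
            best = np.inf
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy <= h - block and 0 <= xx <= w - block:
                        cand = prev[yy:yy + block, xx:xx + block].astype(np.float64)
                        best = min(best, float(np.abs(ref - cand).mean()))
            total += 1
            failed += best > mad_thresh
    # A boundary is declared when too many blocks fail motion estimation.
    return failed / max(total, 1) > fail_ratio
```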

Importance Determination

A content preservation weight matrix is defined:

$S = \min\left( S_{E} + \sum\limits_{i} S_{F}^{i} + S_{MD},\; 1 \right)\quad(1)$

Each entry in the matrix represents the saliency of a single pixel in the source frame I. Values range between 0 and 1, where zero values are content-wise non-important pixels. It is noted that other saliency factors can be added to this formula, for example the output of more object detectors, or the output of background region detection. These factors need not be combined using only a linear function; they can also be combined using a more complex function, such as in the case of a probability computation.
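As a minimal illustration of Eq. (1), assuming the component maps have already been computed as equally shaped numpy arrays (the helper name and signature are an assumption of this sketch):

```python
import numpy as np

def combined_saliency(s_grad, face_maps, s_motion):
    # Eq. (1): sum the gradient term, the per-face terms and the motion
    # term, then clip the result to the [0, 1] range.
    s = s_grad + s_motion
    for s_face in face_maps:   # one map per detected face
        s = s + s_face
    return np.minimum(s, 1.0)
```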

Local Saliency

Various local saliency algorithms can be applied. For example, a simple measure of local information content in the frame, such as the L₂-norm of the gradient, is applied:

$S_{E} = \left( \left( \frac{\partial}{\partial x}I \right)^{2} + \left( \frac{\partial}{\partial y}I \right)^{2} \right)^{1/2}.$

It is noted that the local saliency function can be replaced with another energy function, such as the L₁-norm, or with the output of local saliency detectors.
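A sketch of this gradient term, assuming a grayscale frame; the final normalization to [0, 1] is an added assumption so the term can be combined with the other components of Eq. (1):

```python
import numpy as np

def gradient_saliency(gray):
    # S_E: the L2 norm of the spatial gradient at every pixel.
    gy, gx = np.gradient(gray.astype(np.float64))
    s = np.sqrt(gx ** 2 + gy ** 2)
    return s / s.max() if s.max() > 0 else s   # assumed rescaling to [0, 1]
```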

According to an embodiment of the invention, the local saliency algorithm is based upon a wavelet transformation. The wavelet transform breaks the frame into its “wavelets”: scaled and shifted versions of a “mother wavelet”. These scaled and shifted versions can be calculated by applying high pass filters and low pass filters. These scaled and shifted versions of the frame are associated with levels. Higher levels are referred to as coarser levels and include fewer details. Each level can include a few (such as three) high frequency frame versions (generated by applying a high pass filter on a previous level frame version in a few directions, such as horizontal, vertical and diagonal) and a low frequency frame version (generated by applying a low pass filter on a previous level frame version).

Conveniently, the local saliency algorithm includes: (i) wavelet decomposing a frame into multiple (N) levels; (ii) locating the coarsest diagonal high frequency frame in which the percentage of wavelet coefficients having values below a first threshold is below a second threshold; (iii) thresholding the coarsest diagonal high frequency frame (using the first threshold) to provide a binary frame (in which a bit can be set if the corresponding wavelet coefficient is above the threshold and can be reset if the corresponding wavelet coefficient is below the threshold); (iv) re-sizing the binary frame to the size of the input image; and (v) smoothing the re-sized binary frame (for example by applying a Gaussian blur filter) to provide a saliency score per pixel of the input frame. It is noted that the high frequency diagonal frames can be HH frames (generated by both horizontal and vertical high pass filters), and that the locating stage can start from the N'th level and propagate upwards (to less coarse frames) until reaching the first diagonal high frequency frame in which less than 50% of the wavelet coefficients have a value that is below the first threshold. A sketch of this procedure follows.
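A minimal sketch of this procedure using the PyWavelets package; the wavelet family, the two thresholds and the blur radius are illustrative assumptions:

```python
import numpy as np
import pywt
from scipy.ndimage import gaussian_filter, zoom

def wavelet_saliency(gray, levels=4, coeff_thresh=0.1, fill_ratio=0.5):
    coeffs = pywt.wavedec2(gray.astype(np.float64), 'haar', level=levels)
    # coeffs[k] for k >= 1 is the (cH, cV, cD) triple, coarsest level first.
    chosen = np.abs(coeffs[1][2])                       # coarsest diagonal (HH) band
    for k in range(1, levels + 1):                      # propagate to finer levels
        diag = np.abs(coeffs[k][2])
        if (diag < coeff_thresh).mean() < fill_ratio:   # enough strong coefficients
            chosen = diag
            break
    binary = (chosen >= coeff_thresh).astype(np.float64)   # thresholded binary frame
    h, w = gray.shape
    resized = zoom(binary, (h / binary.shape[0], w / binary.shape[1]), order=1)
    return gaussian_filter(resized, sigma=2.0)             # smoothed per-pixel score
```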

Face Detection

Human perception is highly sensitive to perspective changes in faces, more specifically to frontal portraits. In order to avoid deforming frontal portraits, the Viola and Jones face detection mechanism was applied (P. Viola and M. Jones. Robust real-time face detection. International Journal of Computer Vision, 2004).

The detector returns a list of detected faces. Each detected face i has a 2D center coordinate F_p^i and a radius F_r^i. The face detection score of each pixel is a function of the distance of that pixel from the face's center, $D_{i}(x, y) = \left\| F_{p}^{i} - (x, y) \right\|_{2}$, and is given by the cubic function:

$\hat{S}_{F}\left( x, y \right) = \max\left( 1 - \frac{- D_{i}\left( x, y \right)^{3} + 0.5\, D_{i}\left( x, y \right)^{2}}{- \left( F_{r}^{i} \right)^{3} + 0.5\, \left( F_{r}^{i} \right)^{2}},\; 0 \right)\quad(2)$

This function, which ranges between 0 and 1, is used to weight the importance of the face as an almost constant function with a drastic fall near the boundary of the face. This allows some flexibility at the edges of the face whilst avoiding face deformation. It is noted that the above function can be replaced with another weight function, such as a linear or square function.

A rescaling measure can be provided. It has the following format:

$F_{rn}^{i} = \frac{F_{r}^{i}}{\max\left( C_{Width},\; C_{Height} \right)},\qquad S_{F}^{i}\left( x, y \right) = \hat{S}_{F}^{i}\left( x, y \right)\left( 1 - 2.5\left( F_{rn}^{i} \right)^{4} - 1.5\left( F_{rn}^{i} \right)^{2} \right)\quad(3)$

This measure is used to rescale the general saliency of a detected face in relation to the area it occupies in a C_Width×C_Height pixels frame. A factor of approximately 1 is used where the size of the face is relatively small, while extremely large faces tend to be ignored. The above prevents a distorted zooming effect, i.e. retargeting of the frame such that it is mostly occupied by the detected face. It is noted that the above rescaling function can be replaced with another rescaling function, such as a constant function, a linear function and the like.
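A sketch of Eqs. (2)-(3), assuming that face centers and radii are supplied by some detector (such as the Viola-Jones detector mentioned above); the vectorized layout is an implementation choice:

```python
import numpy as np

def face_saliency(shape, faces):
    # `faces` is a list of ((cx, cy), r) tuples in pixel coordinates.
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    total = np.zeros((h, w))
    for (cx, cy), r in faces:
        d = np.hypot(xs - cx, ys - cy)     # D_i(x, y): distance from face center
        # Eq. (2): near-constant weight inside the face, cubic fall-off at its rim.
        s_hat = np.maximum(1.0 - (-d ** 3 + 0.5 * d ** 2) /
                                 (-(r ** 3) + 0.5 * r ** 2), 0.0)
        # Eq. (3): rescale by the relative size of the face so that extremely
        # large faces do not force a distorted zooming effect.
        rn = r / max(w, h)
        total += s_hat * (1.0 - 2.5 * rn ** 4 - 1.5 * rn ** 2)
    return total
```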

Since, as stated below, when shrinking the width of an image, smoothness over the columns is required, a detected face (detected by a face detector) also prevents thinning the regions below it. Therefore, human bodies are shrunk less, as necessitated. It is noted that a specific human figure detector can be added to, or can replace, the above face detector.

Retargeting examples of a frame from the movie “300” with and without face detection are shown in FIGS. 14a-14d: (a) the original frame; (b) the gradient map, with the faces detected imposed; (c) the result of retargeting to half the width without face detection; and (d) the result of retargeting with face detection. The result for the whole shot, compared to bi-cubic interpolation, is available in the supplemental material.

Motion Detection

Moving objects in video draw most of the viewers' attention and are content-wise important. By using a motion detection mechanism, video can be retargeted while preserving the temporal context.

The second motion detector suggested in S.-C. Liu, C.-W. Fu, and S. Chang, “Statistical change detection with moments under time-varying illumination”, IEEE Trans. on Image Processing, 1998, is implemented. The selected method is efficient and effective, although little known.

Let the frame be partitioned into N×N (N=8) pixel square blocks, and let A_(uv) denote the (u, v)th block. The image coordinate (x, y) is in A_(uv) if (u−1)N+1≤x≤uN and (v−1)N+1≤y≤vN. Define x′=(x) mod N and y′=(y) mod N.

For each block A_(uv), the total intensity of the block at frame t is calculated:

$A_{t}\left( u, v \right) := \sum\limits_{x^{\prime} = 1}^{N} \sum\limits_{y^{\prime} = 1}^{N} I_{t}\left( x, y \right).$

Then, the normalized “circular shift moments” in the x and y directions, mx_t^j(u, v) and my_t^j(u, v), are computed for j=0 . . . N−1. The x moment is formulated as (and respectively for y):

$mx_{t}^{j}\left( u, v \right) = A_{t}\left( u, v \right)^{-1} \sum\limits_{x^{\prime} = 1}^{N} \left( x - j \right)_{\operatorname{mod} N} \cdot \sum\limits_{y^{\prime} = 1}^{N} I_{t}\left( x, y \right)\quad(4)$

A motion in block (u, v) is detected if the maximum absolute difference in any of the computed moments between two consecutive frames is larger than a threshold, i.e., no motion is detected if, for all j, |mx_t^j(u, v)−mx_(t+1)^j(u,v)|<χ and |my_t^j(u, v)−my_(t+1)^j(u,v)|<χ. In some tests χ=0.3.

The motion-based saliency S_MD(x, y) is set to one if the block A_(⌊x/N⌋,⌊y/N⌋) has motion, and zero otherwise. It is noted that the above motion detector can be replaced by other block-based or pixel-based motion detectors, such as the simple “Mean Absolute Difference” block-based motion detector. A sketch of the moment-based detector follows.
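A sketch of the moment-based detector, assuming consecutive grayscale frames as numpy arrays; the guard against empty blocks is an added assumption:

```python
import numpy as np

def motion_saliency(frame_t, frame_t1, n=8, chi=0.3):
    h, w = frame_t.shape
    s_md = np.zeros((h, w))
    idx = np.arange(n)
    for y0 in range(0, h - n + 1, n):
        for x0 in range(0, w - n + 1, n):
            b0 = frame_t[y0:y0 + n, x0:x0 + n].astype(np.float64)
            b1 = frame_t1[y0:y0 + n, x0:x0 + n].astype(np.float64)
            a0 = max(b0.sum(), 1e-9)          # A_t(u, v), guarded against zero
            a1 = max(b1.sum(), 1e-9)
            moved = False
            for axis in (0, 1):               # x moments, then y moments
                line0 = b0.sum(axis=axis)     # marginal intensity profile
                line1 = b1.sum(axis=axis)
                for j in range(n):
                    shift = (idx - j) % n     # circular shift weights of Eq. (4)
                    if abs(float((shift * line0).sum()) / a0 -
                           float((shift * line1).sum()) / a1) > chi:
                        moved = True
            if moved:
                s_md[y0:y0 + n, x0:x0 + n] = 1.0   # S_MD = 1 on moving blocks
    return s_md
```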

According to another embodiment of the invention the motion-based saliency of a pixel can be responsive to the amount of motion. Accordingly, the motion-based saliency can have weights that range between (for example) zero and one, and higher motion values can result in lower time-smoothness weights. This can include: (i) using the motion-based saliency to construct a new time weighting matrix, thus achieving a more “flexible” solution in high motion areas; (ii) building a weighting matrix that reflects the difference between “1” and the values of the motion-based saliency matrix; and (iii) utilizing this matrix when solving the various constraints.

As can be seen in FIG. 5′, the moving objects gain saliency, thus seizing a larger area in the retargeted video.

Optimization

It is suggested to apply a conversion process that is responsive to the importance of an input pixel. It includes finding an optimal mapping between the source image (input image) and the retargeted image (output image), and especially solving a sparse linear system of equations, conveniently by applying a least-squares solution. A more natural formalization is to cast the problem as a constrained linear system. This way one can guarantee that no pixel falls out of bounds and that the mapping preserves the order of the pixels along the scan lines in the image. However, the solution of the unconstrained system is more efficient, and, in practice, the mappings recovered using the unconstrained systems of equations do not contain noticeable artifacts due to changes in the order of the pixels. It is noted that the “least squares” measure used to minimize the cost function can be replaced by any error measuring function, such as the L₁-norm.

In the retargeting process, a pixel (i,j) in frame t of the video is mapped into a pixel in frame t of the output video at some computed location (x_(i,j,t), y_(i,j,t)). Hence, there are twice as many variables (x_(i,j,t) and y_(i,j,t)) to solve for than the number of pixels in the input video. The computation of the y variables can be made separately from the computation of the x variables, using the same linear method (described below). The mapping computation is done one frame at a time (see below), and so the system of equations has (approximately) the same number of unknowns as the number of pixels in one input frame.

Consider the problem of recovering the new x-axis locations x_(i,j,t) of pixels (i,j), i=1 . . . C_Width, j=1 . . . C_Height, in frames t=1 . . . C_Duration. The problem of determining y_(i,j,t) is the transpose of this problem and is solved in a similar manner. Also, consider first the more applicable problem, in which the frame should be shrunk, i.e., mapped to a narrower frame with width C_TargetWidth<C_Width. The expanding problem is similar, though its goal is more application dependent.

There are four types of constraints. First, each pixel needs to be at a fixed distance from its left and right neighbors. Second, each pixel needs to be mapped to a location similar to the ones of its upper and lower neighbors. Third, the mapping of a pixel at time t (the current input image) needs to be similar to the mapping of the same pixel at time t−1 (the previous input image). The fourth constraint fits the warped locations to the dimensions of the target video frames.

Importance Modeling.

If a pixel is not “important” it can be mapped close to its left and right neighbors, consequently blending with them. An “important” pixel, however, needs to be mapped far from its neighbors; thus a region of important pixels is best mapped into a region of a similar size. These insights are formulated into equations stating that every pixel should be mapped at a horizontal distance of 1 from its left and right neighbors. These equations are weighted such that equations associated with pixels with a higher importance score are more influential on the final solution. The first type of equations is therefore:

$S_{i,j,t}\left( x_{i,j,t} - x_{i-1,j,t} \right) = S_{i,j,t}\quad(5)$

$S_{i,j,t}\left( x_{i+1,j,t} - x_{i,j,t} \right) = S_{i,j,t},$

More precisely, since a least-squares solution is applied, an equation arising from a pixel of importance S_(i,j,t) is as influential as

$\frac{S_{i,j,t}^{2}}{S_{i^{\prime},j^{\prime},t^{\prime}}^{2}}$

equations arising from a pixel of importance S_(i′,j′,t′).

It is noted that S is the saliency matrix of Eq. (1), except that the time index appears explicitly. Note that the equation looking right from pixel (i−1,j) can be combined with the equation looking left from pixel (i, j) into one equation:

$\left( S_{i-1,j,t} + S_{i,j,t} \right)\left( x_{i,j,t} - x_{i-1,j,t} \right) = \left( S_{i-1,j,t} + S_{i,j,t} \right)\quad(6)$

Boundary Substitutions.

In order to make the retargeted image fit in the new dimensions, a constraint is added that defines the first pixel in each row of the frame, (1, j, t), to be mapped to the first column of the retargeted video, i.e., ∀j, ∀t: x_(1,j,t)=1. Similarly, the last pixel of each row is mapped to the boundary of the remapped frame: ∀j, ∀t: x_(C_Width,j,t)=C_TargetWidth.

Since the mappings of the first and last pixels in each row are known, there is no need to have unknowns for them. Instead, the actual values are substituted whenever x_(1,j,t) or x_(C_Width,j,t) appear in Eq. (6).

Spatial and Time Smoothness

It is important to have each column of pixels in the input image mapped within the boundaries of a narrow strip in the retargeted image. Otherwise, the image looks jagged and distorted. This type of constraint is weighted uniformly, and takes the form:

$W^{s}\left( x_{i,j,t} - x_{i,j+1,t} \right) = 0\quad(7)$

In the system W^(s)=1. In order to prevent drifting, a similar constraint is added stating that the first and the last pixels of each column have a similar displacement:

$W^{s}\left( x_{i,1,t} - x_{i,C_{Height},t} \right) = 0\quad(8)$

The mapping also has to be continuous between adjacent frames, as stated below:

$W_{i,j,t}^{t}\left( x_{i,j,t} - x_{i,j,t-1} \right) = 0,\quad(9)$

where, in order to prevent distortion of faces, the weighting depends on the face detector saliency map: W^(t)=0.2(1+S_F). Note that, according to an embodiment of the invention, in on-line mode (real-time mode) the resources do not necessarily allow building a system of equations for the whole shot. Instead, the mapping is computed for each frame given the previous frame's computed mapping. This limited-horizon online time-smoothing method, as illustrated in FIG. 6′, can improve results significantly. A sketch of the resulting per-frame system follows.
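The following is a minimal sketch of the per-frame system for the x warp, assembling Eqs. (6)-(9) into a sparse least-squares problem. Enforcing the boundary conditions through heavily weighted rows (instead of the variable substitution described above) and using a constant time weight are simplifying assumptions of this sketch:

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import lsqr

def retarget_row_map(S, target_w, x_prev=None, w_s=1.0, w_t=0.2):
    # S is the (H, W) saliency matrix of the frame; x_prev is the previous
    # frame's solution (or None for the first frame of a shot).
    H, W = S.shape
    def vid(i, j):                        # column index of unknown x_{i,j}
        return j * W + i
    rows, cols, vals, rhs = [], [], [], []
    def add(pairs, b):                    # one least-squares row: sum(c * x) = b
        r = len(rhs)
        for v, c in pairs:
            rows.append(r); cols.append(v); vals.append(c)
        rhs.append(b)
    for j in range(H):
        for i in range(1, W):             # Eq. (6): saliency-weighted unit spacing
            w = S[j, i - 1] + S[j, i]
            add([(vid(i, j), w), (vid(i - 1, j), -w)], w)
    for j in range(H - 1):
        for i in range(W):                # Eq. (7): column (spatial) smoothness
            add([(vid(i, j), w_s), (vid(i, j + 1), -w_s)], 0.0)
    for i in range(W):                    # Eq. (8): no drift across each column
        add([(vid(i, 0), w_s), (vid(i, H - 1), -w_s)], 0.0)
    if x_prev is not None:
        for j in range(H):
            for i in range(W):            # Eq. (9): time smoothness to frame t-1
                add([(vid(i, j), w_t)], w_t * x_prev[j, i])
    big = 1e3                             # boundary conditions, heavily weighted
    for j in range(H):
        add([(vid(0, j), big)], big * 1.0)
        add([(vid(W - 1, j), big)], big * target_w)
    A = sparse.coo_matrix((vals, (rows, cols)), shape=(len(rhs), W * H)).tocsr()
    x = lsqr(A, np.asarray(rhs, dtype=np.float64))[0]
    return x.reshape(H, W)                # new x location of every input pixel
```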

Altering the Aspect Ratio of the Input Image

Examples of aspect ratio altering are exhibited in FIG. 7′ and in other figures throughout this manuscript.

The format of the retargeted videos is as follows: each frame is divided into three sub-frames. The bottom one is the original video frame. The top-right sub-frame is the result of applying bi-cubic interpolation to obtain a new frame of half the input width. The top-left sub-frame is the retargeted result.

While the method does not explicitly crop frames, whenever the unimportant regions in the frame lie away from the frame's center, an implicit cropping is created. See, for example, the retargeting result of the sequence Akiyo (FIGS. 1′a-1′c). Many pixels at the left and right sides of the input frames are mapped into the first and last few columns of the retargeted frames, hence disappearing. FIG. 1′a illustrates an original frame from the standard benchmark news sequence “Akiyo”, FIG. 1′b illustrates a half-width retargeted frame achieved by applying method 100, and FIG. 1′c illustrates a half-width retargeted frame achieved with a prior art transformation.

Down-Sizing Results

The down-sampling results (preserving the aspect ratio) are exhibited in FIG. 8′.

The x-axis and the y-axis warps were computed independently on the original frame and then applied together to produce the output frames. As can be seen, there is a strong zooming-in effect in the results, as necessitated by the need to display large enough objects on a small screen.

It is noted that by using a global error measuring function (such as least squares) the solution tends to uniformly distribute the error across the whole image, rather than concentrate it locally.

Video Expanding

The method can also be used for video expanding. In such a case, however, the desired output depends on the application. In one application, for stills, the task is to keep the original size of the salient objects, while enlarging the video by filling-in the less salient locations with unnoticeable pixels. For such a task, the method can work without any modifications.

In another application, one would like the salient objects to become larger without creating noticeable distortions in the video. A related task is foreground emphasis through non-homogenous warping in-place, where the dimensions of the video remain the same (salient objects are increased in size at the expense of less salient regions). To apply the method in these cases, Equation (6) needs to be altered to have the preferred inflating ratio on the right-hand side. If given by the user or by some heuristic, this is a simple modification; an example is inflation by a fixed factor of two, where the width is increased and the height remains the same.

According to another embodiment of the invention the device and method are adapted to compensate for camera motion, camera zoom-out and camera zoom-in. Accordingly, the method can compensate for (or substantially ignore) motion introduced by camera manipulation and not by an actual movement of the object. This stage can involve incorporating global affine motion registration into the solution of the optimization problem. In such a case the global motion is compensated for before the optimization stage, and added to or subtracted from the optimization solution.

According to another embodiment of the invention the method and device can be used to convert an input video stream to a shorter output video stream. This can be applied by computing an optimal per-pixel time warping via a linear system of equations. Each pixel will be mapped to a time-location in the output video that is similar to that of its spatial neighbors. Important pixels are to be mapped to locations in time distinct from their time-line neighbors. Each frame in the output video is assembled using several input frames, such that moving objects do not overlap.

Data Reduction

According to an embodiment of the invention the conversion process can be simplified, and additionally or alternatively applied on multiple frames at once, by reducing the amount of information that is processed during the conversion process. For example, after the importance of each pixel of a frame is calculated, a smaller information set can be used when calculating the conversion process. The smaller information set can include multiple variables, each representative of the importance of multiple pixels, or it can include only importance information of a subset of the pixels, and the like. The data reduction can be implemented in various mathematical manners, such as but not limited to averaging, quantizing, sub-set selection and the like.

For example, assuming that an input saliency matrix (which includes information about all pixels of the frame) has G elements, then a reduced matrix can include fewer elements (for example G/R). After the smaller matrix is used during the conversion process, the results are up-scaled, for example by using a bilinear filter, a bi-cubic filter, and the like.
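A minimal sketch of this reduction, reusing the retarget_row_map sketch above (its availability is an assumption); output sizes may differ by a pixel due to rounding in the interpolation:

```python
from scipy.ndimage import zoom

def reduced_solve(S, target_w, solver, reduction=4):
    # Solve the warp on a saliency matrix with fewer rows, then upscale the
    # solution back to the full frame height with a bilinear filter.
    H, W = S.shape
    small_S = zoom(S, (1.0 / reduction, 1.0), order=1)   # about (H/reduction, W)
    small_map = solver(small_S, target_w)                # warp on the reduced data
    return zoom(small_map, (H / small_map.shape[0], 1.0), order=1)
```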

Group of Frame Processing

According to an embodiment of the invention the conversion process can be applied on a group of frames. This group can form a shot or a portion of a shot. When the conversion process is applied on frames of a sequence of consecutive frames, the time smoothness can be further improved, as a frame is processed not just in relation to a previous frame but also in relation to one or more following frames.

Conveniently, a single conversion process can be applied on a group of frames after performing data reduction on each frame, but this is not necessarily so: it depends upon the number of frames, upon the computational and memory resources that can be used during the conversion process, and upon timing requirements, as it can be harder to provide real-time (or near real-time) processing of a group of frames.

For example, assuming that the group of images forms a shot, the processing can include: (i) calculating the saliency of every frame in the shot and resizing each saliency matrix (Width×Height) to a reduced matrix (Width×{Height/ReductionFactor}) using bilinear/bicubic interpolation; (ii) generating a combined saliency matrix that includes the different reduced matrices, for example by concatenating the different reduced saliency matrices one after the other, wherein the combined saliency matrix has the following dimensions: Width×(Height*NumberOfMatrices/ReductionFactor); (iii) calculating the optimization matrix with various constraints, such as: (iii.a) X(i,j,t)−X(i,j+1,t)=1; (iii.b) X(i,j,t)−X(i+1,j,t)=0; (iii.c) X(i,j,t)−X(i,j,t+1)=0; (iii.d) X(i,1,1)=1; (iii.e) X(i,Width,NumberOfFrames)=TargetWidth; (iv) adding weights; (v) solving the linear system; and (vi) mapping each frame using the upscaled solution. Steps (i)-(ii) are sketched below.
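A sketch of steps (i)-(ii), assuming per-frame saliency maps given as (Height, Width) numpy arrays; numpy's row-major layout means the concatenation runs along the height axis:

```python
import numpy as np
from scipy.ndimage import zoom

def combined_shot_saliency(saliency_per_frame, reduction=4):
    # Step (i): reduce every (Height, Width) saliency matrix along its height.
    reduced = [zoom(s, (1.0 / reduction, 1.0), order=1) for s in saliency_per_frame]
    # Step (ii): concatenate the reduced matrices one after the other, giving a
    # combined matrix of about (Height * NumberOfMatrices / ReductionFactor) rows.
    return np.concatenate(reduced, axis=0)
```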

Panning

Panning includes emulating a movement of a camera, such as a horizontal movement or rotation. Panning can be introduced when the conversion process is applied on a group of frames. In this case the panning can be represented by selecting portions of a larger frame, wherein the selection of portions provides a panning effect. In this case the conversion process can include mapping pixels of an input frame (whose location within the larger frame changes over time) to an output frame.

Conveniently, these above-mentioned varying boundaries are included in the set of constraints that are solved by the conversion process.

For example, assume that the variable Pan_t is the horizontal panning of frame t. It differs over time to provide the panning effect. Then the conversion process should take into account the following constraints: (i.a) X(i,1,t)−Pan_t=1; (i.b) X(i,n,t)−Pan_t=new_width; (i.c) X(i,j,t)+Pan_t−X(i,j,t+1)−Pan_(t+1)=0, times some weighting (for example 0.11); (ii) Pan_t−Pan_(t+1)=0, times some weighting (for example 0.000001); and (iii) Pan_1=0.

Under these constraints the linear system is solved. For each frame a solution matrix is provided, and Pan_t can be subtracted from it. Then (iv) the solution matrix is upscaled to the original frame size, and (v) the frame is remapped. A sketch follows.
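The following is a minimal sketch of these panning constraints. To keep the system small, it solves for a single pixel row per frame; the per-frame panning variables Pan_t are appended to the unknowns, and the weights are the sample values given above:

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import lsqr

def panning_warp(S_rows, new_width, w_time=0.11, w_pan=1e-6):
    # S_rows is a (T, W) saliency array: one saliency row per frame.
    T, W = S_rows.shape
    nx = T * W                             # x unknowns first, then T pans
    def xid(i, t): return t * W + i
    def pid(t): return nx + t
    rows, cols, vals, rhs = [], [], [], []
    def add(pairs, b):
        r = len(rhs)
        for v, c in pairs:
            rows.append(r); cols.append(v); vals.append(c)
        rhs.append(b)
    for t in range(T):
        add([(xid(0, t), 1.0), (pid(t), -1.0)], 1.0)            # (i.a)
        add([(xid(W - 1, t), 1.0), (pid(t), -1.0)], new_width)  # (i.b)
        for i in range(1, W):              # Eq. (6)-style saliency spacing
            w = S_rows[t, i - 1] + S_rows[t, i]
            add([(xid(i, t), w), (xid(i - 1, t), -w)], w)
    for t in range(T - 1):
        add([(pid(t), w_pan), (pid(t + 1), -w_pan)], 0.0)       # (ii) slow pan
        for i in range(W):                                      # (i.c) smoothness
            add([(xid(i, t), w_time), (pid(t), w_time),
                 (xid(i, t + 1), -w_time), (pid(t + 1), -w_time)], 0.0)
    add([(pid(0), 1.0)], 0.0)                                   # (iii) Pan_1 = 0
    A = sparse.coo_matrix((vals, (rows, cols)), shape=(len(rhs), nx + T)).tocsr()
    sol = lsqr(A, np.asarray(rhs, dtype=np.float64))[0]
    pans = sol[nx:]
    return sol[:nx].reshape(T, W) - pans[:, None]   # subtract Pan_t per frame
```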

A device is provided. The device can include hardware, software and/or firmware.

FIG. 1a illustrates device 200 according to an embodiment of the invention. Device 200 includes a processor 210 and a memory unit 220. It can be connected to a display or a printer, or can include at least one of these components. Memory unit 220 is adapted to store an input image, and processor 210 is adapted to: determine an importance value for each input pixel out of multiple input pixels of an input image and apply on each of the multiple input pixels a conversion process that is responsive to the importance value of the input pixel to provide multiple output pixels that form the output image; wherein the input image differs from the output image. Processor 210 can execute code that is stored in a computer readable medium.

Conveniently, processor 210 includes a local saliency detection module 212, a face detection module 214, a motion detection module 216 and a mapping optimizing module 218. These modules cooperate in order to provide an output image. It is noted that processor 210 can work in an on-line manner, in a partially off-line manner or entirely in an off-line manner. It is further noted that various objects of interest can be detected by processor 210, in addition to or instead of faces. Each module can include software, hardware, firmware or a combination thereof.

Local saliency module 212 calculates local saliency values of pixels. Face detection module 214 detects faces. Motion detection module 216 detects motion. Mapping optimizing module 218 applies the conversion process.

Conveniently, processor 210 is adapted to perform at least one of the following, or a combination thereof: (i) determine an importance of an input pixel in response to an importance input pixel mask; the mask can be defined by a user; (ii) determine an importance of an input pixel in response to motion associated with each of the multiple input pixels; (iii) determine an importance of an input pixel in response to a saliency score of the input pixels; (iv) determine an importance of an input pixel in response to an inclusion of an input pixel within a region of the input image that represents a face of a person and/or within an object of interest; the object of interest is predefined and can depend upon the expected content of the image; for example, when viewing sport events the ball can be defined as an object of interest; (v) generate an output image such that a distance between output image representations of a pair of adjacent input pixels is responsive to an importance of at least one of the pair of adjacent input pixels; thus, for example, a pair of important input image pixels can be mapped to a pair of output image pixels, while a less important pair of input pixels can be mapped to the same pixel, or be mapped to output pixels whereby the distance between their output image representations is less than a pixel; (vi) at least partially compensate for camera induced motion; this motion can result from zoom-in, zoom-out, camera rotation, and the like; (vii) apply an optimal mapping between the input image (original frame or source frame) and the output image (retargeted image); (viii) solve a set of sparse linear equations; (ix) apply a conversion process in response to at least one of the following constraints: each input pixel is mapped to an output pixel that is located at substantially a fixed distance from its left and right neighbors; each input pixel is mapped to an output pixel located at substantially a similar location to that to which upper and lower input pixels are mapped; an input pixel is mapped to an output pixel located substantially at the same location as the output pixel to which the same input pixel in a previous image was mapped; and the size and shape of the output image; (x) perform re-sizing (down-sizing, up-sizing, warping, and the like); (xi) alter an aspect ratio.

The processor is adapted to perform at least one of the above-mentioned operations by executing code. It is noted that the adaptation can involve providing hardware circuitries that can assist in executing one or more of the above-mentioned stages. The hardware can include memory circuitry, logic circuitry, filters, and the like.

Conveniently, the input image belongs to an input image sequence, and processor 210 is adapted to apply a conversion process in response to a relationship between the input image and at least one other input image of the input image sequence.

Processor 210 can execute at least one stage of methods 100 or 300, or a combination thereof. It can, for example, perform data reduction, wavelet decomposition, group-of-frames processing, and panning.

FIG. 1b illustrates device 201 according to an embodiment of the invention. Device 201 differs from device 200 by including processor 211, which also includes a data reduction module 219 and a wavelet decomposition module 217. It is noted that a processor can include only one of these modules. The data reduction and wavelet decomposition stages are further illustrated in FIGS. 2 and 3.

FIG. 2 illustrates method 100 according to an embodiment of the invention. Method 100 starts by stage 110 of determining an interest value for each input pixel out of multiple input pixels of an input image. Conveniently, stage 110 includes stage 111 of applying a wavelet decomposition in order to provide a local saliency of each pixel. Stage 111 is illustrated in FIG. 3. Stage 111 can include at least one of the following stages: (i) stage 111(1) of decomposing a frame into multiple (N) levels; (ii) stage 111(2) of locating the coarsest diagonal high frequency frame in which the percentage of wavelet coefficients having values below a first threshold is below a second threshold; (iii) stage 111(3) of thresholding the coarsest diagonal high frequency frame (using the first threshold) to provide a binary frame; (iv) stage 111(4) of re-sizing the binary frame to the size of the input image; and (v) stage 111(5) of smoothing the re-sized binary frame (for example by applying a Gaussian blur filter) to provide a saliency score per pixel of the input frame.

Stage 110 can include at least one of the following: (i) stage 112 of determining an importance input pixel mask; (ii) stage 113 of determining an importance of an input pixel in response to motion associated with each of the multiple input pixels; the determination can include assigning a binary motion-based saliency score to an input pixel or assigning a non-binary motion-based saliency score to a pixel in response to the amount of motion; (iii) stage 114 of determining an importance of an input pixel in response to a saliency score of the input pixels; (iv) stage 115 of determining an importance of an input pixel in response to an inclusion of an input pixel within a region of an input image that represents a face of a person.

Stage 110 is followed by stage 120 of applying on each of the multiple input pixels a conversion process that is responsive to the interest value of the input pixel to provide multiple output pixels that form the output image; wherein the input image differs from the output image.

Stage 120 can be preceded by stage 119 of performing a data reduction stage. The performing can include representing each set of pixels by a single variable, ignoring pixel importance information, and the like. FIG. 2 illustrates stage 119 as being included in stage 110.

FIG. 4 illustrates various stages that can be included in stage 120. It is noted that some of the stages are overlapping. Conveniently, stage 120 can include at least one of the following stages, or a combination thereof: (i) stage 121 of determining a distance between output image representations of a pair of adjacent input pixels that is responsive to an importance of at least one of the pair of adjacent input pixels; the distance can range from one or more pixels to a portion of a pixel; (ii) stage 122 of at least partially compensating for camera induced motion; (iii) stage 123 of applying an optimal mapping between the input image and the output image; (iv) stage 124 of solving a set of sparse linear equations; (v) stage 125 of applying a conversion process that is responsive to at least one of the following constraints: each input pixel is mapped to an output pixel that is located at substantially a fixed distance from its left and right neighbors; each input pixel is mapped to an output pixel located at substantially a similar location to that to which upper and lower input pixels are mapped; an input pixel is mapped to an output pixel located substantially at the same location as the output pixel to which the same input pixel in a previous image was mapped; and the size and shape of the output image; (vi) stage 126 of re-sizing; and (vii) stage 127 of altering an aspect ratio.

Conveniently, the input image belongs to an input image sequence, and the applying is responsive to a relationship between the input image and at least one other input image of the input image sequence.

FIG. 5 illustrates method 300 according to an embodiment of theinvention.

Method 300 differs from method 100 by the processing of a group of input images.

Method 300 starts by stage 310 of determining an interest value for each input pixel out of multiple input pixels of the group of input images. These multiple input images can form a shot or a portion of a shot.

Stage 310 can resemble stage 110, but it is applied on pixels of a group of images. It can be applied on these pixels simultaneously.

Stage 310 can include stages that are analogous to stages 111-115 and 119. For example, stage 310 can include applying a data reduction stage to provide results of a data reduction stage. In this case stage 320 will include applying on each of the results of the data reduction stage a conversion process that is responsive to the importance value of the results to provide converted results.

Stage 310 is followed by stage 320 of applying on each of the multiple input pixels of the group of images a conversion process that is responsive to the interest value of the input pixel to provide multiple output pixels that form the output image; wherein the input image differs from the output image. The conversion process of a certain image can be responsive to one or more frames that precede this certain image and to one or more images that follow this certain image.

Stage 320 can include stages that are analogous to stages 121-127. It can also include stage 328 of applying the conversion process on elements of a combined saliency matrix, wherein the combined saliency matrix includes multiple saliency matrices, each representative of a saliency of multiple pixels of a single input image out of the group of input images.

Stages 310 and 320 can be executed in a manner that generates a panning effect. Assume that a sequence of K images (IA(1) . . . IA(K)) forms a shot, that the panning effect includes a movement of the camera from left to right, and that portions of size P×Q pixels should be regarded as the input image. In this case the first input image will include the leftmost P×Q pixels of IA(1), the second input image will include a slightly shifted portion of P×Q pixels of IA(2), and the K'th input image will include the rightmost P×Q pixels of IA(K). The pixels that belong to these input images can be processed by applying stages 310 and 320 to provide a panning effect.

Sample Images

FIG. 12a is an original image. FIG. 12b is a half-width retargeted output image generated by applying method 100. FIG. 12c is a half-width retargeted output image generated by applying a prior art method. FIG. 12d illustrates an original image (leftmost image) that undergoes saliency processing, face detection and motion detection, an optimization process (conversion process), and an output frame.

FIG. 13a is an input image, while FIG. 13b is a retargeted frame (half width) generated by applying method 100. It is noted that a cropped window cannot achieve the same frame area utilization.

FIG. 14a is an input frame, FIG. 14b is a gradient map of the input frame with the faces detected imposed, FIG. 14c is a result of retargeting to half the width without face detection, and FIG. 14d is a result of retargeting with face detection.

FIG. 15a is an input image taken from the MPEG/ITU-T committee benchmark video “football”. FIG. 15b is a saliency map that includes the motion. FIG. 15c is a result of bi-cubic interpolation to half the width. FIG. 15d is an output image generated by retargeting without motion-based saliency, and FIG. 15e is the result of retargeting with the full saliency map.

The top row of FIGS. 16a-16c depicts the retargeting results of a certain frame, while the bottom row of FIGS. 16a-16c depicts the retargeting results of another image. FIG. 16a illustrates a result of a bi-cubic interpolation, FIG. 16b illustrates the result of frame by frame retargeting, and FIG. 16c illustrates a time-smoothed retargeting. Time smoothing prevents the video from “jumping around”.

Each of FIGS. 17a-17c illustrates an original frame (bottom of each figure), a result of a bi-cubic interpolation (top-right) and a result of applying method 100 (top-left). The latter retargeting method prevents much of the thinning effect caused by the rescaling and preserves details.

Each of FIGS. 18a-18c illustrates a result of a bi-cubic interpolation (bottom) and a result of applying method 100 (top). The latter retargeting method applies a non-homogenous zoom to the objects of interest.

FIG. 19a illustrates an input frame, while FIG. 19b is a twofold-wider retargeted output image generated by applying method 100.

APPENDIX A

APPENDIX A illustrates a method and system for providing an output image. Especially, a method and device for inhomogeneous 2D texture mapping guided by a feature mask are provided. The mapping can apply one or more conversion processes that are responsive to the feature mask. The mapping preserves some regions of the image, such as foreground objects or other prominent parts. The method is also referred to as the method illustrated in appendix A. This method includes receiving an input frame and a feature mask (defined by a rough selection of the features of interest) and mapping them (by solving a sparse equation set) to an output frame. If a rigid transformation (rigid mapping) is applied, then the features indicated in the feature mask undergo (during the mapping) solely a rigid transformation, possibly at the expense of the background regions in the texture, which are allowed to deform more. If a similarity transformation (similarity mapping) is applied, then the size of a feature can be slightly changed.

Appendix A illustrates a method for providing an output image, the method includes: receiving an input frame and a feature mask defined by a rough selection of the features; applying a mapping process to provide the output image; wherein the mapping process differentiates between pixels of features included in the feature mask and other pixels; wherein the applying comprises solving a sparse equation set.

Conveniently, the mapping process applies a similarity transformation on pixels of features included in the feature mask.

Conveniently, the mapping process allows pixels of features included in the feature mask to slightly change.

Appendix A illustrates a device for providing an output image, the device includes: a memory unit adapted to store an input image and a feature mask defined by a rough selection of the features; and a processor adapted to: apply a mapping process to provide the output image; wherein the mapping process differentiates between pixels of features included in the feature mask and other pixels; wherein the applying comprises solving a sparse equation set.

Conveniently, the processor applies a similarity transformation on pixels of features included in the feature mask.

Conveniently, the processor allows pixels of features included in the feature mask to slightly change.

Appendix A illustrates a computer readable medium that stores instructions for: receiving an input frame and a feature mask defined by a rough selection of the features; applying a mapping process to provide the output image; wherein the mapping process differentiates between pixels of features included in the feature mask and other pixels; wherein the applying comprises solving a sparse equation set.

Conveniently, the computer readable medium stores instructions for applying a similarity transformation on pixels of features included in the feature mask.

Conveniently, the computer readable medium stores instructions for allowing pixels of features included in the feature mask to slightly change.

Instead of cropping the frames, the device and method shrink them while respecting the salient regions and maintaining the user experience. The proposed device and method are efficient, and the optimization stage includes solving a sparse N×N system, where N is the number of pixels in each frame. The method and device are well adapted to batch applications, but are designed for streaming video, since they compute the warp of a given frame based on a small time-neighborhood only, and are fast enough to avoid delays. It is noted that the method and system can also perform up-scaling.

The method and device can be applied to solve several retargeting tasks: video down/up-sampling, aspect ratio alterations, non-homogenous video expansion, video abstraction, object removal from a video, and object insertion into a video while respecting the saliency. It is noted that object removal is done by zeroing the saliency measure of the object, while object insertion is implemented by placing a new blob of pixels in between existing image pixels and setting the importance of the new pixels to a large value.

The method of appendix A does not distort the regions of interest.

The method of appendix A and the system are able to arbitrarily warp a given image while preserving the shape of its features by constraining their deformation to be a similarity transformation.

In particular, the method and system allow global or local changes to the aspect ratio of the texture without causing undesirable shearing of the features. The algorithmic core of the method and system is a particular formulation of the Laplacian editing technique, suited to accommodate similarity constraints on parts of the domain.

The method illustrated in appendix A is useful in digital imaging, texture design and any other application involving image warping, where parts of the image have high familiarity and should retain their shape after modification.

In 2D texture mapping applications, images are mapped onto arbitrary 2D shapes to create various special effects; the texture mapping is essentially a warp of the texture image, with constraints on the shape of the boundary, or possibly the interior of the image as well. Such texture mapping is common in graphical design and publishing tools, as well as 2D and 3D modeling and animation applications. Commercial design tools usually provide a library of predefined warps, where the user only needs to select the desired mapping type and possibly tune a few parameters. Another option is to interactively design the texture map by selecting and transforming points or curves on the original image; the mapping is computed so as to accommodate such user constraints. It is also possible to apply free-form deformations with grid-based controls.

Texture manipulation in 2D is commonly applied by modelers when texturing 3D models: the texture map often needs to be adjusted and aligned to match particular features of the 3D surface. Constrained texture mapping methods have been developed for this purpose, where the user supplies point correspondences between the texture and the 3D model, and a suitable mapping is computed automatically.

Most image mapping and manipulation techniques treat the entire texture image homogeneously. When the deformation applied to an image introduces shearing, e.g. in the simplest situation where the aspect ratio of an image is altered by non-uniform scaling, all the image features are distorted. This may be disturbing when the image contains features with a highly familiar shape, such as humans, animals, prominent geometric objects, etc. A typical example of a simple image transformation is shown in FIG. 12a, where the shear and stretch effects distort the images of the children in a quite unsatisfactory manner.

The method illustrated in appendix A is capable of preserving the shape of masked regions of the texture while warping the image according to the user specifications. This feature-aware texture mapping is guided by a feature mask defined by a rough selection of the features; in the mapping result, these features undergo solely a similarity transformation, possibly at the expense of the background regions in the texture, which are allowed to deform more. This method can be related to the texture optimization techniques of Balmelli et al., where the texture map is warped to allow a higher pixel budget for the high-frequency details of the texture image.

At first glance, it seems that a feature-preserving mapping could be achieved by cutting out the features, warping the rest of the image as desired, and then pasting the features back and adjusting their orientation and scale. However, this poses several difficulties: (i) precise segmentation of the features with correct alpha-mattes for subsequent seamless compositing is required; (ii) it is not clear how to prescribe the similarity transformation of the features; (iii) texture synthesis needs to be applied for the holes that are likely to form around the features; alternatively, the pasted features could overlap with parts of the warped texture, causing information loss. The above tasks are quite complex; moreover, the tuning of such an algorithm would require a significant amount of user interaction. In contrast, the method of appendix A does not require a highly accurate matte but rather a loose selection of the features, which can be done using standard selection tools. The method illustrated in appendix A produces coherent, smooth image warps by drawing upon the recent machinery of differential representations and deformation techniques.

Feature-Aware Mapping

The suggested feature-preserving texture mapping technique is first described assuming that an input warping function W: R²→R² is given. Assume that the input image is represented by a regular pixel grid of dimensions m×n. The grid of the input image is denoted by G=(V,E,K), where V={v₁, v₂, . . . , v_N} is the set of node positions (N=mn), E={(i, j)} is the set of directed edges between the nodes, and K is the set of quad faces of the grid. Throughout the discussion it is assumed that G is a 4-connected quad grid, although the algorithm can be easily extended to any general meshing of the image. It is assumed that the values of the input mapping W on all the grid nodes v_i are known.

The user provides a feature mask that marks the parts of the image whose shape should be preserved. The mask is denoted by M={m₁, . . . , m_N}, such that m_i=1 if pixel i belongs to a feature and m_i=0 otherwise. The feature node indices are thus F={i s.t. m_i=1}. The method partitions F into its connected components: F=F₁∪F₂∪ . . . ∪F_d (see FIG. 2(e)). The method of appendix A aims to find a mapping of the original grid G that is as close as possible to the input warp W and respects the shape of the features specified by the mask M. It is desired to preserve the shape of all the quads contained in the features, meaning that they should undergo solely a similarity or rigid transformation. A rigid transformation implies that the size of the features will be preserved, whereas a similarity transformation allows varying the size according to the warping function W. The user can be left with the choice between rigid and similarity behavior.

A proper shape preserving transformation is provided for each quad $Q=(v_{i_1}, v_{i_2}, v_{i_3}, v_{i_4})$ that has at least one node in F. W(Q) is approximated with a rotation/similarity transformation, by taking the linear component of W and extracting the rotation from it by means of the polar decomposition.

Specifically, denote $W(Q)=(v'_{i_1}, v'_{i_2}, v'_{i_3}, v'_{i_4})$; denote by

$v = \frac{1}{4}\sum_{k=1}^{4} v_{i_k}$

the centroid of Q; the centered vertices are then $u_{i_k} = v_{i_k} - v$ (and similarly $u'_{i_k}$ for W(Q)). The method can linearly approximate the homogeneous part of W on Q by:

$T_{W,Q} = [u'_{i_1}\; u'_{i_2}\; u'_{i_3}\; u'_{i_4}] \cdot [u_{i_1}\; u_{i_2}\; u_{i_3}\; u_{i_4}]^{*},\qquad(10)$

where $A^{*}$ denotes the pseudoinverse of matrix A. In fact, $T_{W,Q}$ is an approximation of the Jacobian of W on Q; if the analytical expression of W is given, $T_{W,Q}$ can be replaced by the Jacobian of W at, say, $v_{i_1}$.
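Equation (10) amounts to a small per-quad pseudoinverse computation. A possible numpy rendering, with illustrative names (quad_src holds the four corners of Q and quad_dst the four warped corners, each as a 4×2 array):

    import numpy as np

    def local_transform(quad_src, quad_dst):
        """Approximate the 2x2 homogeneous part T_{W,Q} of the warp on a
        quad (Eq. (10)): centered warped corners times the pseudoinverse
        of the centered source corners."""
        U_src = (quad_src - quad_src.mean(axis=0)).T  # 2 x 4 centered corners u_{i_k}
        U_dst = (quad_dst - quad_dst.mean(axis=0)).T  # 2 x 4 centered corners u'_{i_k}
        return U_dst @ np.linalg.pinv(U_src)          # 2 x 2 approximation of the Jacobian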

To extract the rigid component of $T_{W,Q}$ the method performs its singular value decomposition: $T_{W,Q} = U\Sigma V^{T}$; the rigid component of $T_{W,Q}$ is then

$R_{W,Q} = VU^{T}.\qquad(11)$
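Continuing the sketch, Eq. (11) extracts the rotation from the singular value decomposition exactly as stated (handling of reflections, i.e. a negative determinant, is omitted here):

    import numpy as np

    def rigid_component(T):
        """Extract the rigid component R_{W,Q} = V U^T of T_{W,Q} = U S V^T
        (Eq. (11))."""
        U, S, Vt = np.linalg.svd(T)
        return Vt.T @ U.T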

To devise the feature-preserving mapping, the method formulates the following optimization problem: it is desired that all the elements outside of F undergo a transformation as close as possible to W, and that all the elements in F undergo solely the rigid (or similarity) component of W. It is convenient to formulate the requirements of this optimization per quad. If quad $Q=(v_{i_1}, v_{i_2}, v_{i_3}, v_{i_4})$ belongs to a feature (i.e. it has at least one node in F), the method defines the following four equations related to its four edges:

$\tilde{v}_{i_{k+1}} - \tilde{v}_{i_k} = R_{W,Q}(v_{i_{k+1}}) - R_{W,Q}(v_{i_k}),\quad k = 1,\ldots,4 \text{ cyclically}\qquad(12)$

where $\tilde{v}_{i_k}$ are the unknown deformed grid nodes. Similarly, if Q does not belong to a feature, we add the following four equations for its edges:

$\tilde{v}_{i_{k+1}} - \tilde{v}_{i_k} = W(v_{i_{k+1}}) - W(v_{i_k}),\quad k = 1,\ldots,4 \text{ cyclically}\qquad(13)$

Overall, the method of appendix A obtains an over-determined system of 4|K| equations in 2N unknowns, which can be solved in the least squares sense. Note that the system is separable in the two coordinates, thus we can solve for x and y separately, with the system matrix containing N columns. The method can constrain the boundary nodes to their positions under W to make the optimization problem well-posed:

$\tilde{v}_i = W(v_i),\quad \forall i \in \partial G.\qquad(14)$
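Assembling Eqs. (12)-(14) row by row could look as follows. This is a rough sketch under the assumption that targets maps each quad index to its 2×2 target transform (R_{W,Q}, λ_i·R_{W,Q} or the local Jacobian of W); the feature weighting discussed in the next section is omitted:

    import numpy as np
    import scipy.sparse as sp

    def assemble_system(V, K, targets, boundary, W_pos):
        """Stack the per-edge equations (12)/(13) and the boundary
        constraints (14) into A x = b_x and A y = b_y."""
        rows, cols, vals, bx, by = [], [], [], [], []
        r = 0
        for q, quad in enumerate(K):
            T = targets[q]
            for k in range(4):                     # four edges, cyclically
                i, j = quad[k], quad[(k + 1) % 4]
                d = T @ (V[j] - V[i])              # prescribed edge vector
                rows += [r, r]; cols += [j, i]; vals += [1.0, -1.0]
                bx.append(d[0]); by.append(d[1]); r += 1
        for i in boundary:                         # Eq. (14): pin boundary nodes
            rows.append(r); cols.append(i); vals.append(1.0)
            bx.append(W_pos[i, 0]); by.append(W_pos[i, 1]); r += 1
        A = sp.csr_matrix((vals, (rows, cols)), shape=(r, len(V)))
        return A, np.array(bx), np.array(by)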

Solving for $\tilde{v}_1, \ldots, \tilde{v}_N$ will provide a mapping that rigidly preserves the features, including their size. To obtain a shape-preserving mapping that allows appropriate scaling of the features, the method can modify the local transformations $R_{W,Q}$ as follows.

The method estimates the average scaling of each connected feature component F_i under W by observing the singular values of the transformations $T_{W,Q}$. For each element Q∈F_i, the method takes the smaller singular value of $T_{W,Q}$ and averages those values over all Q∈F_i, obtaining the average scale factor λ_i. Conveniently, the smaller singular values are averaged because, intuitively, if the image is stretched in one direction, the feature size should remain constant. The target local transformations of the quads in each F_i are thus updated to be $\lambda_i R_{W,Q}$, and Eq. (12) is modified accordingly.
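The scale factor λ_i is then a plain average of smaller singular values over the quads of one component, e.g.:

    import numpy as np

    def component_scale(transforms):
        """Average the smaller singular value of T_{W,Q} over all quads Q
        of one feature component F_i, yielding lambda_i."""
        return float(np.mean([np.linalg.svd(T, compute_uv=False)[-1]
                              for T in transforms]))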

Smoothing the Mapping

When the input warp W deforms the geometry of G to a large extent, feature shape preservation may be compromised. To compensate for such situations, it is useful to apply weights to Eq. (12), which is responsible for feature preservation: each side of those equations is multiplied by a weight w_F (a sample value is w_F=10). Since a least-squares system of equations is solved, this multiplication results in a w_F²-magnification of the corresponding error terms in the minimization functional, forcing the optimization to respect the features more, at the expense of larger deformation of other areas.

However, since the weights are abruptly discontinuous at the feature boundaries (a weight of 1 outside the feature and w_F≫1 inside), such a solution damages the smoothness of the mapping near the feature boundary. This can be easily corrected by assigning a smoother weighting function: computing a local distance field to the feature and assigning smoothly decreasing weights for the quads in the vicinity of the feature as functions of the distance field. The equations associated with those “transition-quads” are of type (12).

The following polynomial can be used as the decay function:

$f(x) = \frac{2}{\rho^{3}}x^{3} - \frac{3}{\rho^{2}}x^{2} + 1,\qquad(15)$

where the constant ρ>0 controls the extent of the decay; the weights in the intermediate region around the feature boundaries are thus defined as:

$w(Q) = w_F \cdot f(D(Q)) + 1\cdot(1 - f(D(Q))),\qquad(16)$

where D(Q) is the value of the distance to the feature at the center of Q. The decay radius ρ is set to be the width of two grid cells; outside of this radius the weights are set to 1.
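A direct transcription of Eqs. (15)-(16) as a per-quad weight function (parameter names are ours; rho defaults to two grid cells, as stated above):

    def quad_weight(dist, w_f=10.0, rho=2.0):
        """Smoothly blended feature weight: w_f at the feature boundary,
        decaying to 1 at distance rho and beyond (Eqs. (15)-(16))."""
        if dist >= rho:
            return 1.0
        f = 2.0 / rho**3 * dist**3 - 3.0 / rho**2 * dist**2 + 1.0  # Eq. (15)
        return w_f * f + (1.0 - f)                                 # Eq. (16)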

Interactive Texture Mapping

Two possible modes of texturing application are differentiated from each other: input-warp mode (described in the previous section) and interactive mode. In both modes, the feature regions of the input image are first specified by a feature mask. In the interactive mode, the user designs the mapping using the standard controls of image boundary editing and/or prescription of inner curve transformations. The mapping is computed taking into account these user-defined constraints and the feature mask, using a deformation technique based on differential coordinates.

These user manipulations are interpreted by the system as positional constraints on the grid nodes, i.e. simply

$\tilde{v}_i = c_i,\quad i \in U,\qquad(17)$

where U is the set of the nodes constrained by the user and c_i are the new positions for those nodes.

The mapping of the free grid nodes is decided by applying the Laplacian editing optimization. The goal of this optimization is to create a smooth and as-rigid-as-possible mapping of the grid shape that respects the user constraints (17).

“As-rigid-as-possible” means that if the user constraints imply solely a rigid (or similarity) transformation of the grid shape, the optimization technique indeed delivers such a transformation; otherwise, the optimization finds a mapping that is locally as close as possible to being rigid, which is perceived as an intuitive result. The optimization involves solving a sparse linear system of size 2N×2N.

Once the mapping function W is established in the above manner, its feature-preserving approximation is created according to the feature mask, as described in Section “Feature-Aware Mapping” above.

Sample Implementation Details

Size        Sys. setup   Factor   Rhs setup   Solve
50 × 100       0.156      0.110     0.015     0
100 × 100      0.375      0.250     0.031     0.015
100 × 200      1.141      0.562     0.047     0.031
200 × 200      2.171      1.407     0.109     0.063

Table 1 illustrates timing statistics (in seconds) for the different parts of the mapping algorithm. Sys. setup stands for the setup of the normal equations matrix; Rhs setup denotes the building of the right-hand side of the normal equations and Solve stands for the back-substitution. Note that the system setup and matrix factorization are done in a pre-process, once per given image grid.

The algorithmic core of the feature-sensitive texture mapping is the solution of the least-squares optimization expressed by Eqs. (12)-(13) and (14).

When put together, these equations form an over-determined linear system of the form:

$A\,[\mathbf{x}\;\mathbf{y}] = [\mathbf{b}_x\;\mathbf{b}_y],\qquad(18)$

where $\mathbf{x} = (\tilde{x}_1,\ldots,\tilde{x}_N)^{T}$ are the x coordinates of the deformed grid and $\mathbf{y} = (\tilde{y}_1,\ldots,\tilde{y}_N)^{T}$ are the y coordinates.

The system is separable in the two coordinates, so the system matrix A has N columns. The matrix is very sparse, since there are only two non-zero coefficients in each row. The system is solved by factoring the normal equations:

$A^{T}A\,[\mathbf{x}\;\mathbf{y}] = A^{T}[\mathbf{b}_x\;\mathbf{b}_y].\qquad(19)$

The Taucs library is used for efficient sparse matrix solvers. Cholesky factorization provides a sparse lower-triangular matrix L such that

$A^{T}A = LL^{T}.\qquad(20)$

Then, the equations can be solved by double back-substitution:

$L\mathbf{x}_{temp} = A^{T}\mathbf{b}_x,\quad L^{T}\mathbf{x} = \mathbf{x}_{temp},\qquad(21)$

and in the same fashion for the y component. Thus, a single factorization serves for solving for multiple right-hand sides.
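Taucs is a C library; a quick prototype might instead reuse a single pre-factorization through SciPy. It is noted that scipy.sparse.linalg.factorized performs an LU decomposition rather than the Cholesky factorization of Eqs. (20)-(21), but it serves the same factor-once, back-substitute-per-right-hand-side pattern (assemble_system is the sketch given earlier, and V, K, targets, boundary and W_pos come from it):

    from scipy.sparse.linalg import factorized

    # Pre-process: depends only on the grid and the feature mask.
    A, bx, by = assemble_system(V, K, targets, boundary, W_pos)
    solve = factorized((A.T @ A).tocsc())   # factor the normal equations once

    # Per-warp update: only the right-hand sides change.
    x = solve(A.T @ bx)                     # deformed x coordinates
    y = solve(A.T @ by)                     # deformed y coordinates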

The construction of the A matrix, the normal equations matrix and the factorization can be attributed to the pre-process, since they only depend on the grid and the feature map of the input image; the matrix factorization is the most computationally-intensive part, taking a few seconds for grids with several tens of thousands of quads. Once the factorization is computed, back substitution is extremely fast (see Table 1).

When varying the input warp function W, only the right-hand side of the system (the b_x, b_y vectors) needs to be updated, followed by back-substitution, so the user can experiment with various mappings in real time. Of course, manipulation of very large images may slow down due to the large dimensions of the system matrix; to maintain interactive response in this case, the grid is defined to be slightly coarser than the pixel grid of the input image, so that the size of the system remains in the order of 20000-50000 variables. For example, the system can efficiently handle an image of 1000×1000 pixels by defining the size of the grid cells to be 5×5 pixels.

Computing the initial mapping from interactively-placed user constraints (Section “Interactive Texture Mapping”) also requires solving a sparse linear system of size 2N×2N. It is done in the same manner, pre-factoring the system matrix and solely varying the right-hand side of the system when the user manipulates the boundary constraints. Since the back-substitution is fast, the manipulation is interactive, as demonstrated in the accompanying video.

The above-mentioned feature-sensitive texturing system was tested on a Pentium 4 3.2 GHz computer with 2 GB RAM. It was assumed that the feature mask comes together with the input image, defined in some external image editing software. During the experiments the feature maps were created in Photoshop using the standard selection tools (Magic Wand, Lasso and Magnetic Lasso). The process of feature selection is quite easy since the feature-aware texturing needs only a rough binary matte.

The inventor experimented with various input warping functions that are commonly available in most image editing packages. The results of unconstrained mapping were compared with the above-mentioned feature-preserving mapping in various figures. It can be clearly seen in all the examples that the above-mentioned mapping preserves the shape of the features while gracefully mimicking the input mapping function. The similarity-preserving mapping allows uniform scaling of the features, and thus it has more freedom to approximate the input mapping. For instance, when the input mapping implies enlargement of the image, the similarity-preserving mapping will allow uniform scaling of the features, whereas the rigid mapping will constrain the features to remain at their original size, thus introducing more stretch to the background areas.

In extreme deformation cases, the feature-aware mapping may introduce fold-overs, which may result in texture discontinuity. Preventing self-intersections within the least-squares optimization is quite difficult; it is noted that the method can be adapted to perform post-processing relaxations to fix the fold-overs.

Sample Images

FIGS. 6a-6b, 7a-7h, 8a-8c, 9a-9d, 10a-10f and 11a-11l illustrate the differences between applying a prior art mapping method and applying the method illustrated in appendix A.

FIG. 6a is the result of applying a prior art mapping process on an image while FIG. 6b is a result of applying the method of appendix A. It is noted that in FIG. 6a the legs of the children are squeezed and their heads are stretched. These effects do not appear in FIG. 6b.

FIG. 7a is an original image. FIGS. 7b, 7c and 7d are the results of applying a prior art mapping process on an image so as to map the image onto a vertically stretched output frame, onto an arc shaped output frame and onto a sinusoidal output frame, respectively. FIG. 7e illustrates a feature map (of features F1-F11) that is generated by applying the method illustrated in appendix A. FIGS. 7f, 7g and 7h are the results of applying the method illustrated in appendix A so as to map an image onto a vertically stretched output frame, onto an arc shaped output frame and onto a sinusoidal output frame, respectively.

FIG. 8a is an original image. FIG. 8b illustrates a result of applying the method illustrated in appendix A so as to map an image onto a vertically (×2) stretched output frame. FIG. 8c illustrates an underlying grid.

FIG. 9a is an original image. FIG. 9b is the result of applying a prior art mapping process on an image so as to map the image onto a swirl feature output frame. FIG. 9c illustrates a result of applying the method illustrated in appendix A, according to an embodiment of the invention, so as to map the image onto a swirl feature output frame while constraining the size of features. FIG. 9d illustrates a result of applying the method illustrated in appendix A so as to map the image onto a swirl feature output frame while allowing uniform scaling of the features.

FIG. 10a is an original image. FIG. 10b is the result of applying a prior art mapping process on an image so as to map the image onto a three dimensional shape. FIG. 10c illustrates the result of applying the method illustrated in appendix A on an image so as to map the image onto a three dimensional shape.

FIG. 10d is an original image. FIG. 10e is the result of applying a prior art mapping process on an image so as to map the image onto another three dimensional shape. FIG. 10f illustrates the result of applying the method illustrated in appendix A on an image so as to map the image onto another three dimensional shape.

FIG. 11a is an original image. FIG. 11b is the result of applying a prior art mapping process on an image so as to map the image onto a swirl feature output frame. FIG. 11c illustrates a result of applying the method illustrated in appendix A so as to map the image onto a swirl feature output frame while constraining the size of features. FIG. 11d illustrates a result of applying the method illustrated in appendix A so as to map the image onto a swirl feature output frame while allowing uniform scaling of the features.

FIG. 11e is an original image. FIG. 11f is the result of applying a prior art mapping process on an image so as to map the image onto an arc shaped output frame. FIG. 11g illustrates a result of applying the method illustrated in appendix A so as to map the image onto an arc shaped output frame while constraining the size of features. FIG. 11h illustrates a result of applying the method illustrated in appendix A so as to map the image onto an arc shaped output frame while allowing uniform scaling of the features.

FIG. 11i is an original image. FIG. 11j is the result of applying a prior art mapping process on an image so as to map the image onto a circular shaped output frame. FIG. 11k illustrates a result of applying the method illustrated in appendix A so as to map the image onto a circular shaped output frame while constraining the size of features. FIG. 11l illustrates a result of applying the method illustrated in appendix A so as to map the image onto a circular shaped output frame while allowing uniform scaling of the features.

Variations, modifications, and other implementations of what is described herein will occur to those of ordinary skill in the art without departing from the spirit and the scope of the invention as claimed. Accordingly, the invention is to be defined not by the preceding illustrative description but instead by the spirit and scope of the following claims.

We claim:
 1. A method for providing an output image, the method comprising: determining an importance value for each input pixel out of multiple input pixels of an input image; applying on each of the multiple input pixels a conversion process that is responsive to the importance value of the input pixel to provide multiple output pixels that form the output image; wherein the input image differs from the output image; wherein the determining is responsive to motion associated with each of the multiple input pixels.
 2. The method according to claim 1 wherein the determining is responsive to a saliency score of the input pixels.
 3. The method according to claim 1 wherein the determining is responsive to an inclusion of an input pixel within an input image area that represents a face of a person.
 4. The method according to claim 1 wherein the determining is responsive to an inclusion of an input pixel within an input image area that represents an object of interest.
 5. A method for providing an output image, the method comprising: determining an importance value for each input pixel out of multiple input pixels of an input image, wherein the importance value is at least based on a saliency score, the saliency score based on at least one of: spatial constraints, object detection and/or motion detection for each of the multiple input pixels; applying on each of the multiple input pixels a conversion process that is responsive to the importance value of the input pixel to provide multiple output pixels that form the output image; wherein the input image differs from the output image; wherein a distance between output image representations of a pair of adjacent input pixels is responsive to an importance of at least one of the pair of adjacent input pixels; and outputting the output image.
 6. A method for providing an output image, the method comprising: determining an importance value for each input pixel out of multiple input pixels of an input image; applying on each of the multiple input pixels a conversion process that is responsive to the importance value of the input pixel to provide multiple output pixels that form the output image; wherein the input image differs from the output image; wherein the applying is responsive to at least one of the following constraints: each input pixel is mapped to an output pixel that is located at substantially a fixed distance from its left and right neighbors; each input pixel is mapped to an output pixel located at substantially a similar location to which upper and lower input pixels are mapped; an input pixel is mapped to an output pixel located substantially at a same location as an output pixel to which the same input pixel at a previous image was mapped; and size and shape of the output image.
 7. A method for providing an output image, the method comprising: determining an importance value for each input pixel out of multiple input pixels of an input image; applying on each of the multiple input pixels a conversion process that is responsive to the importance value of the input pixel to provide multiple output pixels that form the output image; wherein the input image differs from the output image; wherein the determining is responsive to a saliency score of the input pixels; wherein the saliency score is computed by applying a wavelet decomposition process.
 8. A method for providing an output image, the method comprising: determining an importance value for each input pixel out of multiple input pixels of an input image, wherein the importance value is at least based on a saliency score, the saliency score based on at least one of: spatial constraints, object detection and/or motion detection for each of the multiple input pixels; applying on each of the multiple input pixels a conversion process that is responsive to the importance value of the input pixel to provide multiple output pixels that form the output image; wherein the input image differs from the output image; wherein the determining is responsive to a saliency score of the input pixels; wherein the saliency score is computed by locating the coarsest diagonal high frequency frame in which the percentage of wavelet coefficients having values below a first threshold is below a second threshold; and outputting the output image.
 9. A method for providing an output image, the method comprising: determining an importance value for each input pixel out of multiple input pixels of an input image, wherein the importance value is at least based on a saliency score, the saliency score based on at least one of: spatial constraints, object detection and/or motion detection for each of the multiple input pixels; applying on each of the multiple input pixels a conversion process that is responsive to the importance value of the input pixel to provide multiple output pixels that form the output image; wherein the input image differs from the output image; wherein the determining is responsive to a saliency score of the input pixels; wherein the saliency score is computed by applying a wavelet decomposition process; wherein the wavelet decomposition process is followed by thresholding a diagonal high frequency image to generate a binary frame; re-scaling the binary frame; and smoothing the re-scaled binary frame, and outputting the output image.
 10. A method for providing an output image, the method comprising: determining an importance value for each input pixel out of multiple input pixels of an input image; applying on each of the multiple input pixels a conversion process that is responsive to the importance value of the input pixel to provide multiple output pixels that form the output image; wherein the input image differs from the output image; wherein the applying is preceded by applying a data reduction stage; and applying on each of the results of the data reduction stage a conversion process that is responsive to the importance value of the results to provide converted results.
 11. A method for providing an output image, the method comprising: determining an importance value for each input pixel out of multiple input pixels of an input image; applying on each of the multiple input pixels a conversion process that is responsive to the importance value of the input pixel to provide multiple output pixels that form the output image; wherein the input image differs from the output image; determining an importance value for each input pixel out of multiple input pixels of a group of input images; and applying on each of the multiple input pixels a conversion process that is responsive to the importance value of the input pixel to provide multiple output pixels that form a group of output images; wherein input images differ from output images.
 12. A device for providing an output image, the device comprising: a memory unit adapted to store an input image and a processor, adapted to: determine an importance value for each input pixel out of multiple input pixels of an input image and apply on each of the multiple input pixels a conversion process that is responsive to the importance value of the input pixel to provide multiple output pixels that form the output image, wherein the importance value is at least based on a saliency score, the saliency score based on at least one of: spatial constraints, object detection and/or motion detection for each of the multiple input pixels; wherein the input image differs from the output image; wherein the processor is adapted to determine an importance of one of the multiple input pixels in response to motion associated with each of the multiple input pixels and output the output image.
 13. The device according to claim 12 wherein the processor is adapted to determine an importance of an input pixel in response to a saliency score of the input pixels.
 14. The device according to claim 12 wherein the processor is adapted to determine an importance of an input pixel in response to an inclusion of an input pixel within an input image that represents a face of a person.
 15. A device for providing an output image, the device comprising: a memory unit adapted to store an input image and a processor, adapted to: determine an importance value for each input pixel out of multiple input pixels of an input image and apply on each of the multiple input pixels a conversion process that is responsive to the importance value of the input pixel to provide multiple output pixels that form the output image, wherein the importance value is at least based on a saliency score, the saliency score based on at least one of: spatial constraints, object detection and/or motion detection for each of the multiple input pixels; wherein the input image differs from the output image; wherein the processor is adapted to apply a conversion process in response to at least one of the following constraints: each input pixel is mapped to an output pixel that is located at substantially a fixed distance from its left and right neighbors; each input pixel is mapped to an output pixel located at substantially a similar location to which upper and lower input pixels are mapped; an input pixel is mapped to an output pixel located substantially at a same location as an output pixel to which the same input pixel at a previous image was mapped; and size and shape of the output image; and output the output image.
 16. The method of claim 5 wherein the saliency score is based on the spatial constraints, the object detection and the motion detection for each of the multiple input pixels.
 17. The method of claim 8 wherein the saliency score is based on the spatial constraints, the object detection and the motion detection for each of the multiple input pixels.
 18. The method of claim 9 wherein the saliency score is based on the spatial constraints, the object detection and the motion detection for each of the multiple input pixels.
 19. The device of claim 12 wherein the saliency score is based on the spatial constraints, the object detection and the motion detection for each of the multiple input pixels.
 20. The device of claim 15 wherein the saliency score is based on the spatial constraints, the object detection and the motion detection for each of the multiple input pixels.